Image-based Word Recognition in Oriental Language Document Images
Authors
Abstract
An algorithm for word recognition in Oriental languages such as Chinese, Japanese, and Korean is presented. The objective is to recognize words, each composed of a number of consecutive characters, in document images where there are no explicit visually defined word boundaries. The technique exploits the redundancy in these languages, expressed as the difference between the number of possible character strings of a fixed length and the number of legal words of that length. Sequences of character images are matched simultaneously against lists of legal words and lists of illegal strings that are likely to occur. A word is located if its image is more likely to occur in the current context than any of the illegal strings that are visually similar to it. No intermediate character recognition step is used. Applying contextual information directly to the interpretation of features extracted from the image overcomes noise that could have made isolated character recognition impossible and made it difficult to locate words with conventional postprocessing algorithms. Experimental results are presented that show the ability of this algorithm to correctly recognize text in the presence of noise.
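The decision rule in the abstract (accept a word only if it is more likely than every visually similar illegal string) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the assumption of independent per-position likelihoods, and the probability floor are all hypothetical, Latin letters stand in for Oriental characters, and the "current context" term mentioned in the abstract is omitted.

```python
import math

# Hypothetical floor for characters with no recorded likelihood (an assumption).
FLOOR = 1e-9

def string_log_likelihood(candidate, window_scores):
    """Sum per-position log-likelihoods of `candidate`.

    window_scores[i] maps a character to an assumed likelihood of the
    image features observed at position i given that character.
    """
    return sum(
        math.log(window_scores[i].get(ch, FLOOR))
        for i, ch in enumerate(candidate)
    )

def recognize_window(window_scores, legal_words, illegal_strings):
    """Return the best legal word if it beats every illegal string, else None.

    Only candidates whose length equals the window length are compared.
    """
    n = len(window_scores)
    legal = [w for w in legal_words if len(w) == n]
    illegal = [s for s in illegal_strings if len(s) == n]
    if not legal:
        return None
    best_word = max(legal, key=lambda w: string_log_likelihood(w, window_scores))
    best_score = string_log_likelihood(best_word, window_scores)
    for s in illegal:
        if string_log_likelihood(s, window_scores) >= best_score:
            return None  # an illegal look-alike explains the image at least as well
    return best_word

# Two-character window; the scores are made-up illustrative values.
clean = [{"A": 0.6, "B": 0.4}, {"C": 0.7, "D": 0.3}]
noisy = [{"A": 0.3, "B": 0.7}, {"C": 0.7, "D": 0.3}]
result_clean = recognize_window(clean, ["AC"], ["BC", "AD"])
result_noisy = recognize_window(noisy, ["AC"], ["BC", "AD"])
```

In this toy setup the whole candidate string is scored at once, so no character is committed to before the word-level comparison, which mirrors the abstract's point that no intermediate character recognition step is used.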
Similar resources
Image-based keyword recognition in oriental language document images
An algorithm is presented for keyword recognition in Oriental language document images. The objective is to recognize keywords composed of more than one consecutive character in document images where there are no explicit visually defined word boundaries. The technique exploits the redundancy expressed by the difference between the number of possible character strings of a fixed length and the...
Document Image Retrieval Based on Keyword Spotting Using Relevance Feedback
Keyword spotting is a well-known method in document image retrieval. In this method, search in document images is based on a query word image. In this paper, an approach for document image retrieval based on keyword spotting is proposed. The proposed method presents a framework using relevance feedback. Relevance feedback, an interactive and efficient method, is used in this paper to imp...
Retrieval of machine-printed Latin documents through Word Shape Coding
This paper reports a document retrieval technique that retrieves machine-printed Latin-based document images through word shape coding. Adopting the idea of image annotation, a word shape coding scheme is proposed, which converts each word image into a word shape code by using a few shape features. The text contents of imaged documents are thus captured by a document vector constructed with the...
A Statistical Approach to Retrieving Historical Manuscript Images without Recognition
Handwritten historical document collections in libraries and other areas are often of interest to researchers, students or the general public. Convenient access to such corpora generally requires an index, which allows one to locate individual text units (pages, sentences, lines) that are relevant to a given query (usually provided as text). Several solutions are possible: manual annotation (ve...
Connected Component Based Word Spotting on Persian Handwritten image documents
Word spotting makes unindexed image documents searchable by locating a word or words in a document image, given a query word. This problem is challenging, mainly due to the large number of word classes with very small inter-class and substantial intra-class distances. In this paper, a segmentation-based word spotting method is presented for multi-writer Persian handwritten doc-...
Journal title:
Volume, issue
Pages -
Publication date: 1998